Linguistic Redundancy in Twitter

Authors

  • Fabio Massimo Zanzotto
  • Marco Pennacchiotti
  • Kostas Tsioutsiouliklis
Abstract

In the last few years, the research community's interest in microblogs and social media services, such as Twitter, has grown exponentially. Yet so far little attention has been paid to a key characteristic of microblogs: the high level of information redundancy. The aim of this paper is to approach this problem systematically by providing an operational definition of redundancy. We cast redundancy in the framework of Textual Entailment Recognition. We also provide quantitative evidence of the pervasiveness of redundancy in Twitter and describe a dataset of redundancy-annotated tweets. Finally, we present a general-purpose system for identifying redundant tweets. An extensive quantitative evaluation shows that our system successfully solves the redundancy detection task, improving over baseline systems with statistical significance.

Similar resources

Understanding U.S. regional linguistic variation with Twitter data analysis

We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013 – Oct. 2014) to understand the regional linguistic variation in the U.S. Prior work on regional linguistic variations usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation of fine spatiotemporal resolutio...

Information Overload, Similarity, and Redundancy: Unsubscribing Information Sources on Twitter

The emergence of social media has changed individuals’ information consumption patterns. The purpose of this study is to explore the role of information overload, similarity, and redundancy in unsubscribing information sources from users’ information repertoires. In doing so, we randomly selected nearly 7,500 ego networks on Twitter and tracked their activities in 2 waves. A multilevel logistic...

Analyzing the Dynamic Evolution of Hashtags on Twitter: a Language-Based Approach

Hashtags are used in Twitter to classify messages, propagate ideas and also to promote specific topics and people. In this paper, we present a linguistic-inspired study of how these tags are created, used and disseminated by the members of information networks. We study the propagation of hashtags in Twitter grounded on models for the analysis of the spread of linguistic innovations in speech c...

Inferring gender of a Twitter user using celebrities it follows

This paper addresses the task of user gender classification in social media, with an application to Twitter. The approach automatically predicts gender by leveraging observable information such as the tweet behavior, linguistic content of the user’s Twitter feed and the celebrities followed by the user. This paper first evaluates linguistic content based features using LIWC dictionary and popul...

Ideological Consumerism in Colombian Elections, 2015: Links Between Political Ideology, Twitter Activity, and Electoral Results

Propagation of political ideologies in social networks has shown a substantial impact on voting behavior. Both the contents of the messages (the ideology) and the politicians' influence on their online audiences (their followers) have been associated with such an impact. In this study we evaluate which of these factors exerted a major role in deciding electoral results of the 2015 Colombian reg...

Publication date: 2011